Peptide classification using optimal and information theoretic syntactic modeling
Abstract
We consider the problem of classifying peptides using the information residing in their syntactic representations. This problem, which has been studied for more than a decade, has typically been investigated using distance-based metrics that involve the edit operations required in the peptide comparisons. In this paper, we demonstrate that the Optimal and Information Theoretic (OIT) model of Oommen and Kashyap [22], which is applicable to syntactic pattern recognition, can be used to tackle the peptide classification problem. We advocate that one can model the differences between compared strings as a mutation model consisting of random substitutions, insertions and deletions obeying the OIT model. We then show that the probability measure obtained from the OIT model can be perceived as a sequence similarity metric, with which a support vector machine (SVM)-based peptide classifier can be devised. The classifier we have built has been tested with eight different substitution matrices and on two different data sets, namely, the HIV-1 Protease cleavage sites and the T-cell epitopes. The results show that the OIT model performs significantly better than the one which uses a Needleman–Wunsch sequence alignment score, that it is less sensitive to the substitution matrix than the other methods compared, and that, when combined with an SVM, it is among the best peptide classification methods available. © 2010 Elsevier Ltd. All rights reserved.
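For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a Needleman–Wunsch global alignment score. It is not the authors' implementation and it does not implement the OIT model. The function name `needleman_wunsch_score`, the flat match/mismatch/gap values, and the example strings are all assumptions made for illustration; the paper instead evaluates eight substitution matrices, which are not reproduced here.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    Illustrative scoring only (assumption): a flat match/mismatch score
    stands in for a real amino-acid substitution matrix.
    """
    # prev[j] holds the best score for aligning the first i-1 symbols
    # of a with the first j symbols of b; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: the first i symbols of a aligned against gaps only.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # substitute or match a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] aligned to a gap (deletion)
                curr[j - 1] + gap,  # b[j-1] aligned to a gap (insertion)
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical short strings, not taken from either data set.
    print(needleman_wunsch_score("SQNY", "SQY"))
```

A score of this kind is a deterministic edit-based measure; the abstract's proposal is to replace it with the probability assigned by the OIT mutation model and to feed that similarity to an SVM.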
Related resources
On Utilizing Optimal and Information Theoretic Syntactic Modeling for Peptide Classification
Syntactic methods in pattern recognition have been used extensively in bioinformatics, and in particular, in the analysis of gene and protein expressions, and in the recognition and classification of biosequences. These methods are almost universally distance-based. This paper concerns the use of an Optimal and Information Theoretic (OIT) probabilistic model [11] to achieve peptide classificati...
Modeling gene regulatory networks: Classical models, optimal perturbation for identification of network
A deep understanding of molecular biology has allowed the emergence of new technologies like DNA decryption. On the other hand, advancements in molecular biology have made the manipulation of genetic systems simpler than ever; this promises extraordinary progress in biological, medical and biotechnological applications. This is not an unrealistic goal since genes which are regulated by gene regulatory ...
A game Theoretic Approach to Pricing, Advertising and Collection Decisions adjustment in a closed-loop supply chain
This paper considers advertising, collection and pricing decisions simultaneously for a closed-loop supply chain (CLSC) with one manufacturer (he) and two retailers (she). A multiplicatively separable new demand function is proposed which is influenced by pricing and advertising. In this paper, three well-known scenarios in game theory, namely the Nash, Stackelberg and Cooperative games, are expl...
Semantic role labeling of Persian sentences using a memory-based learning approach
Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...
Optimal and Information Theoretic Syntactic Pattern Recognition for Traditional Errors
In this paper we present a foundational basis for optimal and information theoretic syntactic pattern recognition. We do this by developing a rigorous model, M*, for channels which permit arbitrarily distributed substitution, deletion and insertion syntactic errors. More explicitly, if A is any finite alphabet and A* the set of words over A, we specify a stochastically consistent scheme by whic...
Journal: Pattern Recognition
Volume: 43
Publication date: 2010